---
output:
  pdf_document:
    citation_package: natbib
    keep_tex: false
    fig_caption: true
    latex_engine: pdflatex
    template: header.tex
title: "No Longer Conforming to Stereotypes? Gender, Political Style, and Parliamentary Debate in the UK"
author: 
- name: Lotte Hargrave
  affiliation: University College London
- name: Jack Blumenau
  affiliation: University College London
abstract: "Research on political style suggests that where women make arguments that are more emotional, empathetic, and positive, men use language that is more analytical, aggressive, and complex. However, existing work does not consider how gendered patterns of style vary over time. Focusing on the UK, we argue that pressures for female politicians to conform to stereotypically 'feminine' styles have diminished in recent years. To test this argument, we describe novel quantitative text analysis approaches for measuring a diverse set of styles at scale in political speech data. Analysing UK parliamentary debates between 1997 and 2019, we show that female MPs' debating styles have changed substantially over time, as women in parliament have increasingly adopted stylistic traits that are typically associated with 'masculine' stereotypes of communication. Our findings imply that prominent gender-based stereotypes of politicians' behaviour are significantly worse descriptors of empirical reality now than they were in the past." # 149 words
link-citations: yes
always_allow_html: true
geometry: margin=1in
fontfamily: mathpazo
fontsize: 12pt
bibliography: test
biblio-style: apsr

--- 

\noindent \textbf{Keywords:} gender; legislative politics; debate; style; stereotypes; text-as-data

\begin{center}
Word count: 9974
\end{center}

\thispagestyle{empty}
\newpage

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE, fig.pos= "h")
library(data.table)

# Load data

load("working/speech_scores.Rdata")
load("working/r_squared_time.Rdata")

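# Keep only debates with at least five participating MPs (more than four MP-debate rows)
speech_scores[, keep := .N > 4, by = section_id]
speech_scores <- speech_scores[keep == TRUE]

# Drop the 1992-1997 parliamentary term; the study period begins in May 1997
speech_scores <- speech_scores[parliamentary_term != "1992-1997"]

## R-squared figures: average over the earlier (rows 1-10) and later (rows 11-20) rows

early_r2 <- colMeans(r_squared_time[1:10,])
late_r2 <- colMeans(r_squared_time[11:20,])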

```


\newpage

# Supplementary Material 

Online appendixes are available at [INSERT LINK HERE ONCE READY]. 

# Data Availability Statement 

Replication data for this article can be found in Harvard Dataverse at [https://doi.org/10.7910/DVN/PPSFLT](https://doi.org/10.7910/DVN/PPSFLT). 

# Acknowledgements 

We thank Lucy Barnes, Jennifer Hudson, Markus Kollberg, Tone Langengen, Ben Lauderdale, Rebecca McKee, Alice Moore, Tom O'Grady, Meg Russell, Jess Smith, and Sigrid Weber for helpful conversations and insightful comments. We are also grateful to participants at working group seminars at University College London, PSA Polmeth 2020, and the EGEN Summer Working Group 2020. Finally, we thank Agnes Magyar and Alicia O'Malley for excellent research assistance.

# Financial Support

None. 

# Competing Interests 

None. 

\newpage

\doublespacing

# Introduction

Have incentives for politicians to conform to gender stereotypes diminished over time? In addition to the fact that female and male politicians speak about systematically different sets of political issues [@Back2019a; @Catalano2009], gendered differences are also said to arise in argumentation style. Gendered communication styles are thought to be rooted in stereotypes that create social expectations for women to act "like women" and men "like men" [@Eagly2012]. If politicians internalise these expectations before entering politics, or if voters punish them for contravening gender stereotypes [@Bauer2015; @Boussalis2020; @Cassese2018], legislators are likely to engage in gender-role consistent behaviour, and we should expect systematic differences in the political styles that female and male politicians adopt. Empirical evidence supports this view: compared to their male colleagues, female politicians' speeches are more emotional [@Dietrich2019b], less complex and jargonistic [@Coates2015], less repetitive [@Childs2004b], less aggressive [@Kathlene1994], and use different types of evidence to support their arguments [@Hargrave2020].

We contribute to this literature by evaluating the degree to which gendered differences in political style vary over time. Consistent with work that views stereotypes as dynamic constructs [@Eagly2012; @Diekman2000a], we argue that over the past 25 years in the UK, where we situate our study, several factors are likely to have reduced the degree to which UK MPs (and especially women) conform to stereotype-consistent behaviours in parliament. First, politicians are drawn from a broader population, which has itself diverged from stereotypical communicative styles in recent years. Second, changes to the social roles played by women in public life, and in politics, have reduced the validity of gender stereotypes in the eyes of the public. Consequently, we argue that voters are less likely to sanction female legislators for gender-incongruent behaviours now than in the past. Finally, the increased prominence of women in parliament and leadership roles is also likely to reduce the degree to which female politicians internalise expectations that they need to behave in "feminised" ways. Together, these arguments lead to a central behavioural prediction which we test empirically: that UK MPs will conform less to gender stereotypes now than in the past, and that women in particular will adopt styles that are further from feminine stereotypes over time.

To evaluate this expectation, we examine politicians' styles as they manifest in one prominent legislative activity: parliamentary debates. We conceive of *debating* style as a characteristic of speech which is distinct from its content. Intuitively, the style of a speech reflects the manner in which an argument is delivered. In social psychology, women's "communal" styles are thought to be marked by higher levels of emotionality, positivity, empathy, and warmth, while men's "agentic" styles are thought to be marked by higher levels of aggression, logic, and confidence [@Eagly2012; @Schneider2019]. In political science, these concepts have been operationalised using a diverse set of indicators. We survey both literatures and identify eight styles that reflect the ideas of communality and agency, and are also -- in principle -- detectable in the speeches politicians deliver. The eight styles which we use as the basis of our empirical analysis are human narrative, affect, positive emotion, negative emotion, factual language, aggression, complex language, and repetition. 

In addition to our substantive argument, our paper also provides a methodological contribution to the measurement of style in legislative settings [@Boussalis2020; @Dietrich2019b]. Our goal is to construct measures that closely approximate the conceptual definitions of each of the styles that we highlight in our review of the literature. For some styles, we use existing quantitative text analysis measures that have been extensively validated in other settings [e.g. @Kincaid1975]. For others, we develop new measures that combine traditional dictionary approaches with a word-embedding model. Our strategy overcomes limitations of standard dictionary approaches as it enables us to detect styles as they manifest *in the specific context of parliamentary debate*. Evidence from a human validation task shows that our measures significantly outperform standard measures that have been used extensively in previous research on political style.

We apply our measures to nearly half a million speeches delivered in the House of Commons between 1997 and 2019 and report three main findings. First, in the early parts of our study period, we document patterns of style which are broadly consistent with expectations from the literature on gender stereotypes. Male MPs' speeches are marked by substantially higher levels of aggression and complexity, while female MPs' speeches are considerably more emotional, positive, and make greater use of human narrative. Second, and crucially, we find that these differences have diminished dramatically in recent years. In six out of eight styles, MP gender explains less variation in style use between individuals at the end of our study period than at the beginning. For cases where we report *diverging* behaviour over time, these stylistic shifts run counter to gender-role expectations. Third, we show that the evolving variation in style use has primarily resulted from women's decreasing use of communal styles, and increasing use of agentic styles, over time.

Our work builds upon a large literature (cited above) on gender differences in politicians' communication styles, the vast majority of which considers whether gender stereotypes accurately capture real behavioural differences at fixed points in time. By contrast, we show that the descriptive validity of prominent stereotypes of how men and women communicate is considerably lower in the contemporary House of Commons than it was in the past. 

Our findings also have important implications beyond scholarly accounts of legislative politics. Prescriptive stereotypes for how men and women *ought* to behave are ultimately rooted in our collective understanding of how we expect men and women *will* behave [@Eagly2012]. In politics, these prescriptions can form the basis of voter judgements about the behaviour of male and female politicians. Documenting when behavioural shifts run counter to gender-based stereotypes is important, then, because it potentially undermines these prescriptions, and could thereby diminish the degree to which women will be subject to penalties for failing to conform to stereotypical expectations [@Cassese2018; @Ditonto2017].

Additionally, our results cast doubt on the idea that the election of more women into office will automatically result in a less adversarial and more deliberative culture in Westminster.[^culture_example] At least in the context of parliamentary debate, our findings suggest that such effects are unlikely to materialise because they are based on an outdated assumption about the distinctiveness of female MPs' political styles. In addition, by placing hopes of cultural change on newly elected women, proponents of these views may also be setting female politicians up to fail if their increased presence does not result in a "better" political culture. To the extent that cultural change of this sort is a desideratum of modern parliamentary politics, our results suggest that hopes of effecting such change should not rely on the presumption that newly elected women will conform to anachronistic stereotypes, and that purposive reforms to parliamentary practices may instead be necessary.

[^culture_example]: See, for example, [Designing a new parliament with women in mind](https://www.democraticaudit.com/2016/07/29/designing-a-new-parliament-with-women-in-mind/), *Democratic Audit*, 29th July 2016. 

# Gender, stereotypes, and debating style

Why might men and women in parliament employ different debating styles? Gender role theory [@Eagly2002] suggests that gender stereotypes concerning the typical conduct of men and women can affect behaviour via two main channels. First, repeated exposure to stereotypes from a young age may lead men and women to *internalise* expectations relevant to their genders, which then become self-imposed standards against which they regulate their own behaviour [@Eagly2012]. Second, descriptive stereotypes (e.g., the perceived tendency for women to be emotional) are often thought to lead to prescriptive stereotypes (e.g., the view that women *should* be emotional), the violation of which leads to the imposition of *social sanctions* by others which further incentivise conformity with gender-based norms [@Brescoll2008].

Female politicians are especially subject to pressures to conform to role-consistent behavioural standards, as voters punish women for displaying behaviour that counters feminine stereotypes. Voters form gender-biased impressions of candidates [@Bauer2015], and penalise women for appearing to be too ambitious [@Okimoto2010] or negative [@Cassese2018], while rewarding them for displays of happiness [@Boussalis2020]. These penalties are more acute when the campaign environment is characterised by "masculine" issues [@Holman2016], and are more commonly applied by low-attention voters [@Bauer2015a] and voters with sexist attitudes [@Mo2015] or aggressive personalities [@Bauer2021]. By contrast, voters are less sensitive to role-inconsistent behaviour by male politicians [@Okimoto2010]. 

However, as political leaders, legislators are also expected to display behaviours consistent with *leadership* stereotypes. Men's historical occupation of leadership positions means that leadership stereotypes have been shaped by traditional "masculine" traits such as being assertive, competitive, and outgoing [@Koenig2011]. Because leadership stereotypes and masculine stereotypes are largely congruent, male politicians face little difficulty in conforming to both sets of expectations. By contrast, if women seek to conform to *leadership* stereotypes, they may risk incurring penalties for violating *feminine* stereotypes [@Bauer2017; @Eagly2002]. Women must therefore attempt to balance a complicated array of behavioural expectations in a way that men do not. 

Political scientists have evaluated whether these incentives induce male and female politicians to adopt systematically different political styles. We focus on one aspect of legislative activity where differences are likely to become manifest: in political speech. On which dimensions of style should we expect gender differences? Of central concern in the social psychology literature is the distinction between *communal* characteristics of style, which are associated with women, and *agentic* characteristics which are associated with men [@Schneider2019]. These labels are heuristics for clusters of behavioural attributes, where communal characteristics are said to include being "affectionate, helpful, kind, sympathetic, interpersonally sensitive, nurturant, and gentle", while agentic characteristics include being "aggressive, ambitious, dominant, forceful, independent, self-sufficient, and self-confident" [@Eagly2002, 574]. By surveying the large empirical literature on gendered styles in political science, we identified eight styles that are representative of "communal" or "agentic" behaviour and that previous work had either shown to differ between male and female politicians or had *expected* to be associated with gender differences. We use these styles as the basis of our empirical analysis below.

We identified three "communal" characteristics of style that are typically associated with women. First, women are said to make greater use of **human narrative** through reliance on personal experience, analogies, and anecdotes in their speeches [@Blankenship1995]. This idea is supported both by politicians' testimonies [@Childs2004b] and by qualitative studies of political speech [@Hargrave2020]. Second, women are also thought to make greater use of emotional language or **affect** [@Huddy1993], and there is clear evidence that women's language exhibits greater overall emotionality than men's [@Dietrich2019b; @Jones2016]. Third, and more specifically, women have been found to use more **positive emotion**, such as expressing happiness, in their political speeches than men [@Boussalis2020; @Yu2013].

We identified five "agentic" characteristics of style that are typically associated with men. First, men are thought to rely more on **fact-based** language, which is more "analytical, organised and impersonal" and relies more on statistical evidence [@Jamieson1988, 76]. In the UK, MPs suggest that male politicians pay greater attention to "scientific research" [@Childs2004b, 181], though in other settings there is evidence that women may use more factual language [@Hargrave2020].

Second, male politicians' speech is also thought to feature higher levels of linguistic **complexity**, marked by formalistic and jargonistic word use [@Childs2004b], while women are thought to be more accessible and clear [@Coates2015]. Third, men are also thought to be more **repetitive** [@Dahlerup1988; @Childs2004b, 184], and, fourth, more **aggressive**, whereas women are said to avoid combative and aggressive styles [@Brescoll2008; @Kathlene1994], and empirical work suggests women are significantly less adversarial than men in parliamentary debate [@Grey2002; @Hargrave2020]. Fifth, women are thought to avoid the use of excess **negative emotion** for fear of backlash [@Cassese2018], while men are thought to make greater use of negativity [@Brooks2011].

We summarise these eight styles in Table \ref{tab:definitions_table}, where we provide a short definition of each and categorise it as either "communal" or "agentic" according to our discussion above. The expectations that derive from the literature on gendered stereotypes suggest that women will be more likely to use communal debating styles, and men more likely to use agentic debating styles. 


\singlespacing

\footnotesize
\vfill

\begin{longtable}{lp{0.15\textwidth}p{0.60\textwidth}}
\caption{Political styles}
\label{tab:definitions_table}
\\
\toprule
Style & Type & Definition \\\midrule
Human Narrative & Communal & Use of personal examples or experiences; stories of other people; constituency stories; illustrative examples; referring to individuals.\\
& & \\
Affect & Communal & Use of emotive language, whether positive or negative, such as expressing criticism, praise, disapproval, pride, empathy or fear. \\ 
& & \\
Positive Emotion & Communal & Use of positive emotional language, which might include expressing empathy, praise, celebration or congratulations. \\
& & \\
Fact & Agentic & Use of numbers, statistics, numerical quantifiers, figures and empirical evidence. \\
& & \\
Complexity & Agentic & Use of jargonistic, complicated and elaborate language that is challenging to understand. \\
& & \\
Repetition & Agentic & Repeated use of the same words or phrases. \\
& & \\
Aggression & Agentic & Use of aggressive or combative language, which might include criticisms or insults; language that suggests forceful action; or declamatory or adversarial language. \\
& & \\
Negative Emotion & Agentic & Use of negative emotional language, which might include expressing fear, anxiety, unpleasantness, sadness or disapproval. \\
\bottomrule
\end{longtable}

\normalsize
\doublespacing

## Dynamic gender stereotypes 

Despite this rich literature, few studies consider whether politicians' conformity with gender stereotypes has changed over time.[^other_time_lit] This is surprising, as gender role theorists emphasise that the content and strength of stereotypes are dynamic [@Diekman2000a; @Eagly2012]. These accounts posit that gender stereotypes arise from men's and women's historical occupation of different social roles which are associated with different characteristics. For instance, because women have traditionally occupied roles in which they provide care to others, "caring" as a characteristic became stereotypic of women. However, as the distribution of men and women into different roles changes, so too will the characteristics associated with the stereotypes themselves, such that the stereotype of women will be marked by "increasing masculinity and... decreasing femininity" [@Diekman2000a, 1173]. Building on this logic, we argue that recent changes in the roles played by women in both politics and the broader public are likely to have weakened traditional gender stereotypes in the UK, and we therefore expect a decline in the degree to which MPs, and especially women, will adopt styles that are congruent with the stereotypes described above. 

[^other_time_lit]: Though see @Jones2016 for a case study of the evolution of Hillary Clinton's style, and @Grey2002, who shows that female MPs in New Zealand became increasingly aggressive over time. 

First, politicians are selected from a broader population, which has itself diverged from gender-stereotypical behaviours over time. In most advanced economies in recent decades, women's traditional role as care-givers has declined, and women's educational attainment, participation in the workforce, and occupancy of senior management positions have increased [@Sayer2004; @Diekman2006, 370]. As societal gender roles have changed, women in the general public have come to demonstrate increasingly agentic behaviours across a wide set of contexts and countries [@Twenge2001; @Leaper2007, 357]. As politicians are likely to reflect the characteristics of the population from which they are drawn, if women in the UK are now more agentic on average, we should expect these changes to be reflected in the behaviour of politicians too.

Second, changing social roles have affected public perceptions of the validity of traditional gender stereotypes. Women in general are perceived as more agentic now than in the past [@Eagly2020; @Senden2019], and while attributes associated with men have remained relatively stable, masculine characteristics are increasingly ascribed to women [@Diekman2000a]. Do changes in attitudes regarding the content and validity of stereotypes mean that voters are less likely to punish counter-stereotypic behaviours? As we reported above, several papers document voters' tendency to punish female candidates for displaying agentic traits. However, we are not aware of any existing empirical literature which tracks the extent to which politicians are punished by voters for contravening stereotypes *over time*. Several more recent studies suggest that voters do not *always* punish female politicians for violating feminine stereotypes [@Brooks2011; @DeGeus2020; @Saha2020], but these papers again only provide evidence from a single point in time. 

There is, however, evidence that the public have become less likely to endorse traditional gender stereotypes over time. As women's position in the labour market has improved, support for traditional gender norms and associated stereotypes has eroded both in the UK and further afield [@inglehart2003rising; @twenge1997attitudes; @seguino2007pluscca]. In the UK, voters' attitudes have become substantially more gender-egalitarian since the mid-1980s [@Taylor2018]. Further, between 1990 and 2010, voters in Western Europe, including in the UK, became significantly less likely to agree with the traditional division of social roles performed by men and women [@Shorrocks2018]. This latter finding is particularly relevant given that it is the association of men and women with particular social roles that is at the heart of theories of gender stereotypes [@Eagly2002]. It therefore seems likely that as voters have become less willing to endorse gender stereotypes, they will also apply fewer sanctions to politicians who transgress such stereotypes. Consequently, as @Mo2015[360] argues, "gender attitudes in the electoral process remain consequential, but have grown subtler". To the extent that politicians in the UK are sensitive to the expectations of voters, then, changing voter attitudes about stereotypes will likely have reduced pressures on female politicians to conform to gender-stereotypic behaviours over time. 

Third, the dramatic shifts in the roles that women play in *political* life in recent decades might also reduce the degree to which female politicians conform to traditional gender stereotypes. In the House of Commons, women held just 18% of seats in 1997, but this increased to 32% by 2019 [@IPU2020]. Moreover, female politicians in the UK now occupy more high-powered positions within the legislative hierarchy [@Blumenau2019b]. As women enter politics at a higher rate, role theory predicts that female politicians will come to be seen as possessing more masculine characteristics [@diekman2005dynamic], and the increasing prevalence of women in leadership has been shown to reduce the degree to which communal qualities are ascribed to women [@Dasgupta2004]. As @diekman2005dynamic[212] argue, "women’s increased representation as elected officials and government employees should foster the ascription to women of traditionally masculine qualities." 

Accordingly, in addition to a general tendency for stereotypes of women to become more oriented towards agentic characteristics in recent years, female politicians *specifically* may have become associated with more masculine characteristics over time as they have become more numerous and more powerful in the (historically male) political domain. These shifts in the political sphere might help to further reduce voter sanctions against female politicians who adopt agentic styles, but they are also likely to reduce the degree to which women in parliament *internalise* expectations of feminine behaviour. That is, as female politicians witness more examples of women in politics adopting more agentic and less communal styles, this may weaken the self-imposed standards of femininity that are typically seen as the internal drivers of stereotype-consistent behaviour [@Eagly2012]. As female politicians become increasingly associated with forms of behaviour normally ascribed to men, the incentives for them to conform to more traditional feminine styles should be expected to decrease.

In our empirical analysis we do not attempt to disentangle which of the three mechanisms outlined here, or others, may be responsible for changes in parliamentary behaviour. Rather, we aim to test a central prediction that emerges from our discussion of dynamic gender stereotypes: that MPs will conform less to gender-stereotypic styles in recent years than was true in the past, and that women in particular will be more likely to adopt agentic rather than communal styles over time. 

## Pressures to conform to institutional behavioural norms

In this section, we contrast the predictions of our argument with expectations generated by theories of feminist institutionalism. These perspectives hold that, as historically male-dominated institutions, legislatures are gendered spaces that maintain, favour and recreate traditional masculine behaviours [@Hawkesworth2003; @Krook2011]. While work in this literature does not always articulate clear predictions regarding the dynamics of gendered behaviour over time, implicit in these arguments is the idea that the pressure for women to conform to the prevailing (male) institutional style will be strongest when women are more marginalised in the legislature. When this is the case, as @Franceschet2011[66] argues, "women may respond by disavowing distinctly feminine (and feminist) concerns, instead favouring the style and substantive issues of the dominant group." By contrast, as women gain higher levels of representation and more political power, the culture of parliament will change to be more "conducive to women acting in a feminized way" [@Childs2004b, 187].

The implication of this argument in our setting is that the pressure on women to conform to the dominant "masculine" institutional style will be strongest at the beginning of our study period (in the late 1990s) when women's representation in the Commons was at lower levels, but that it should weaken over time. In addition to the increasing number of women in parliament and in leadership positions, during the period we study (from 1997 to 2019) the House of Commons also introduced a series of reforms designed to strengthen the position of women within the legislature.[^add_examples] Therefore, while the Commons remains majority male, institutionalist perspectives predict that changes in composition and working practices will mean that women will be better able to "perform their tasks as politicians the way they individually prefer" [@Dahlerup2006, 519]. Thus, while our argument suggests that women are likely to respond to changing gender stereotypes by adopting more *agentic* styles over time, the institutionalist argument suggests that women are likely to adopt increasingly *communal* styles as they become more numerous and powerful in parliament. The empirical strategy we outline below allows us to adjudicate between these contrasting predictions.

[^add_examples]: For example, this period includes the establishment of the Women and Equalities Committee, the introduction of the Speaker's Reference Group on Representation and Inclusion, as well as the introduction of initiatives such as proxy voting for MPs on baby leave.

# Data and Methodology 

We consider the words of politicians' speeches as the primary locus of debating style, and we use texts of political speeches delivered in parliament to infer the styles adopted by different speakers. Parliamentary speech is a useful source of information for measuring style as it provides long-running panel data at the individual level. In the UK, MPs are afforded a large degree of autonomy regarding the debates to which they contribute, and party leaders exert no control over who participates, nor over the content of speeches that MPs deliver.

We study House of Commons debates between May 1997 and March 2019. Our study period is motivated by the fact that prior to the 1997 election women accounted for less than 10% of MPs, and so analysis of earlier periods would likely be sensitive to the styles of only a few specific women. We collapse our data such that all speeches made by an MP in a debate constitute a single speech-document, making our unit of analysis an individual MP in a debate. We remove all speech-documents shorter than 50 words, as well as contributions by the Speaker of the House, whose speeches are almost entirely procedural. We also exclude any debate that has fewer than five participants.[^short_debates] Our final sample consists of `r format(length(unique(speech_scores$section_id)),big.mark=",")` debates, `r format(length(unique(speech_scores$person_id)),big.mark=",")` MPs (`r format(length(unique(speech_scores$person_id[speech_scores$gender=="Female"])),big.mark=",")` female, `r format(length(unique(speech_scores$person_id[speech_scores$gender=="Male"])),big.mark=",")` male), and `r format(nrow(speech_scores),big.mark=",")` MP-debate observations. 

[^short_debates]: Our model becomes computationally burdensome with very large numbers of debates. Small debates contribute little to our estimates given the random-effect structure of the model described below, and so our results are very unlikely to be sensitive to this decision.
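The sample construction described above can be sketched in R with `data.table`, which the setup chunk already loads. This is a minimal sketch, not the replication code: the input table and its columns (`section_id`, `person_id`, `is_speaker`, `text`) are hypothetical, word counts are approximated by splitting on whitespace, and the participant count is assumed to be taken after the 50-word filter.

```r
library(data.table)

# Hypothetical input: one row per individual contribution to a debate
build_mp_debate_docs <- function(speeches, min_words = 50, min_participants = 5) {
  # Drop contributions by the Speaker, which are almost entirely procedural
  docs <- speeches[is_speaker == FALSE]
  # Collapse all contributions by one MP in one debate into a single document
  docs <- docs[, .(text = paste(text, collapse = " ")), by = .(section_id, person_id)]
  # Approximate the word count by splitting on whitespace
  docs[, n_words := lengths(strsplit(trimws(text), "\\s+"))]
  docs <- docs[n_words >= min_words]
  # Keep only debates with at least `min_participants` distinct MPs
  docs[, n_participants := .N, by = section_id]
  docs[n_participants >= min_participants]
}
```

With the default arguments, the function applies the 50-word and five-participant thresholds; because each row is already one MP in one debate, `.N` within a debate counts distinct participants, mirroring the `.N > 4` filter in the setup chunk.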

## Measuring "style" with context-specific dictionaries

A common approach to measuring latent concepts, such as style, in text data is to assign each text a score based on a predefined dictionary that aims to capture the concept of interest. However, dictionary-based approaches are highly domain-specific, as the words used to capture a concept in one context -- say, parliamentary speeches -- are likely to be different to those used to express the same concept in another context. We propose an alternative approach that combines standard dictionaries with a locally-trained word-embedding model to construct domain-specific dictionaries that are better able to capture our style types *as they manifest in the context of parliamentary debate*. The key advantage of our approach is that it allows us to account for context-specific patterns of word use. That is, rather than simply using an off-the-shelf dictionary that may be poorly suited to capturing, for instance, aggression in the parliamentary setting, this approach allows us to automatically create a bespoke aggression dictionary which is firmly rooted in the way that vocabulary is used in parliamentary debate. We use this approach to measure six of our styles: aggression, affect, positive emotion, negative emotion, fact, and human narrative.
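For comparison with the approach we propose, the conventional procedure can be sketched in a few lines of base R. The sketch assumes one common variant, in which a text's score is the share of its tokens that appear in the dictionary; the tokeniser (lower-casing and splitting on non-letter characters) and the toy dictionary are illustrative assumptions, not the seed dictionaries used in this paper.

```r
# Share of each text's tokens that appear in a fixed (off-the-shelf) dictionary
dictionary_score <- function(texts, dictionary) {
  # Lower-case and split on anything other than letters and apostrophes
  tokens <- strsplit(tolower(texts), "[^a-z']+")
  vapply(tokens, function(tok) {
    tok <- tok[nzchar(tok)]              # drop empty strings left by leading punctuation
    if (length(tok) == 0) return(NA_real_)
    mean(tok %in% dictionary)            # proportion of tokens matched
  }, numeric(1))
}

# Hypothetical toy dictionary, not one of the paper's seed dictionaries
toy_dictionary <- c("shameful", "disgrace", "attack")
dictionary_score(c("This is a shameful disgrace.", "I welcome the report."), toy_dictionary)
# returns c(0.4, 0)
```

The limitation discussed above is visible here: any parliamentary usage that is absent from the fixed word list contributes nothing to a text's score, however closely it resembles the listed words in meaning.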

For each style, we follow three steps to construct the relevant score for each speech. First, we define a "seed" dictionary that represents our concept of interest. For four styles (affect, negative emotion, positive emotion, and fact), we use existing dictionaries and for two styles (aggression and human narrative) we create our own seed dictionaries based on a close reading of a sample of parliamentary texts.[^seed_dictionaries]

[^seed_dictionaries]: We include a full description of the seed dictionaries in the appendix.

Second, we estimate a set of word-embeddings using the GloVe model described by @Pennington2014. Word-embedding models rely on the idea that words which are used in similar contexts will have similar meanings, and the embedding model allows us to *learn* the semantic meaning of each word directly from how the word is used by MPs in debate. We train the embedding model on the full set of parliamentary speeches, and the main output of the model is the set of word-embeddings themselves. These are dense vectors that correspond to each unique word in the corpus, the dimensions of which capture the semantic "meanings" of the words. Crucially for our purposes, the distances *between* word-vectors have been shown to effectively capture important semantic similarities between different words [@Mikolov2013].

Because distances between word-vectors capture semantic similarity, we can use this property to define the set of words that, *in the specific context of parliamentary debate*, are used in a semantically similar fashion to the seed words. To do so, we calculate the cosine similarity between every word in the corpus and the words in each of our seed dictionaries. We label this quantity $Sim_{w}^s$, where $w$ indexes words, and $s$ indexes each style. Words closely related to the average semantic meaning of the seed words for a given dictionary will have a high similarity score (close to 1), and words that are less closely related will have a low similarity score (close to 0). The $Sim_{w}^s$ scores therefore define a domain-specific dictionary for a given style type. They describe the degree to which each unique word in our corpus is used similarly to the ways in which the words in our seed dictionary are used, on average. In essence, the embeddings enable our seed dictionaries to automatically expand to incorporate words that are used in a similar manner to the words that they already include. We provide full details of our approach, and an extensive set of validation checks, in the appendix. 
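
The expansion step can be illustrated with a minimal sketch. It assumes, following the description above, that $Sim_{w}^s$ is the cosine similarity between each word's embedding and the average (centroid) of the seed-word embeddings. The toy three-dimensional vectors and the function names are hypothetical; this is not the authors' implementation, which relies on GloVe embeddings trained on the full corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def expand_dictionary(embeddings, seed_words):
    """Return {word: Sim_w^s}: similarity of every word to the seed centroid."""
    seeds = [embeddings[w] for w in seed_words if w in embeddings]
    dim = len(seeds[0])
    # Average the seed vectors dimension by dimension.
    centroid = [sum(vec[k] for vec in seeds) / len(seeds) for k in range(dim)]
    return {w: cosine(vec, centroid) for w, vec in embeddings.items()}


# Hypothetical toy "embeddings" (not real GloVe output).
toy_embeddings = {
    "disgraceful": [0.9, 0.1, 0.0],
    "shameful": [0.8, 0.2, 0.1],
    "outrageous": [0.85, 0.15, 0.05],
    "committee": [0.0, 0.2, 0.9],
}
sim = expand_dictionary(toy_embeddings, ["disgraceful", "shameful"])
for word, score in sorted(sim.items(), key=lambda kv: -kv[1]):
    print(f"{word:12s} {score:.3f}")
```

In this toy example, *outrageous* is not a seed word but receives the highest score because its vector sits at the seed centroid, while *committee* receives a low score. This is the sense in which the seed dictionary "expands" to words used in a similar way.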

Third, we use the word-level scores, $Sim_{w}^s$, to score each *sentence* in the corpus on each style according to the words they contain. In particular, the score for a given sentence on a given style is: 

\begin{equation} \label{sim_eq}
Score_{i}^s = \frac{\sum_w^W Sim_{w}^s N_{wi}}{\sum_w^W N_{wi}}
\end{equation}

\noindent where $Sim_{w}^s$ is the similarity score defined above, and $N_{wi}$ is the (weighted) number of times that word $w$ appears in sentence $i$, where the weights are term-frequency inverse-document-frequency weights.[^tf_idf] When words with high scores for a given style appear frequently in a given sentence, the sentence will be scored as highly relevant to the style. The score for each *document* is then the weighted average of the relevant sentence-level scores, where the weights are equal to the number of words in each sentence. 
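
Equation \ref{sim_eq} and the document-level aggregation can be sketched as follows. The sketch assumes that inverse document frequencies are computed by treating each sentence as a "document" with the common $\log(n/df_w)$ formula, and that words without an embedding receive a similarity of zero; the source does not specify these choices, and the function names and similarity values are hypothetical.

```python
import math
from collections import Counter

def idf_weights(sentences):
    """IDF weights, treating each sentence as a 'document' (an assumption)."""
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(s))
    return {w: math.log(n / df_w) for w, df_w in df.items()}

def sentence_score(tokens, sim, idf):
    """Score_i^s: average of Sim_w^s over a sentence's words, TF-IDF weighted."""
    counts = Counter(tokens)
    n_wi = {w: c * idf.get(w, 0.0) for w, c in counts.items()}
    total = sum(n_wi.values())
    if total == 0:
        return 0.0  # arbitrary choice: the sentence has no informative words
    # Words missing from `sim` are given Sim = 0 (another assumption).
    return sum(sim.get(w, 0.0) * n for w, n in n_wi.items()) / total

def document_score(sentences, sim, idf):
    """Document score: sentence scores averaged with weights = sentence length."""
    lengths = [len(s) for s in sentences]
    scores = [sentence_score(s, sim, idf) for s in sentences]
    return sum(n * x for n, x in zip(lengths, scores)) / sum(lengths)


# Hypothetical similarity scores for an "aggression" dictionary.
sim = {"the": 0.1, "minister": 0.2, "is": 0.1, "disgraceful": 0.9, "budget": 0.05}
speech = [["the", "minister", "is", "disgraceful"],
          ["the", "budget", "is", "late"]]
idf = idf_weights(speech)
s1 = sentence_score(speech[0], sim, idf)
doc = document_score(speech, sim, idf)
```

Because "the" and "is" appear in every sentence here, their IDF weight is zero, so the first sentence's score is driven entirely by "minister" and "disgraceful".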

[^tf_idf]: TF-IDF weighting is used to down-weight very common words, and up-weight relatively rare words.

The approach we outline here is similar to that developed in @Rice2019, who also use a word-embedding model to create context-specific dictionaries. We build on this work by addressing the question of whether word-embedding dictionary construction "[yields] valid dictionaries for widely-varying types of specialised vocabularies" [@Rice2019, 34]. We extend the idea of word-embedding based dictionaries to a new setting -- the UK House of Commons -- and to six new specialised vocabularies. As our validation exercises in the appendix show, there is strong evidence that our approach significantly outperforms standard dictionary approaches across the set of styles we study.

## Measuring "complexity" and "repetition" 

For our final two styles -- complexity and repetition -- dictionary approaches (domain-specific or otherwise) are unsuitable, as these styles are not detectable from the occurrence of specific words. Instead, we adopt two different metrics to capture these concepts. For *complexity*, we use the Flesch-Kincaid Readability Score [@Kincaid1975]. The intuition behind this measure is that documents that have fewer words per sentence, and fewer syllables per word, are easier to understand (more "readable"). We rescale the original formulation of the score such that higher numbers indicate higher levels of complexity. While @Benoit2019[501] show that domain-specific measures of textual complexity have some performance gains over the Flesch-Kincaid score, they also demonstrate that this metric correlates highly with more sophisticated measures. We opt for the simpler metric here because our own validation demonstrates that this measure performs well in comparisons with human judgements in our setting. 
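
A minimal sketch of the complexity score, assuming the Flesch Reading Ease variant (where higher values mean easier text) and assuming that the rescaling is a simple sign flip; the paper does not state the exact transformation. Syllable counts are taken as given inputs, because syllable-counting heuristics vary across software.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Flesch Reading Ease: higher values indicate easier text."""
    return (206.835
            - 1.015 * (n_words / n_sentences)
            - 84.6 * (n_syllables / n_words))

def complexity(n_words, n_sentences, n_syllables):
    """Hypothetical rescaling: negate so that higher means more complex."""
    return -flesch_reading_ease(n_words, n_sentences, n_syllables)


# Short sentences with short words versus one long sentence with long words.
simple = complexity(n_words=20, n_sentences=4, n_syllables=24)
hard = complexity(n_words=40, n_sentences=1, n_syllables=80)
```

Since the measure is standardised before analysis, any monotone decreasing transformation of Reading Ease (such as this negation) orders speeches identically.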

We consider MPs to be *repetitive* when they use the same language repeatedly during a debate. To measure repetition, we use a lossless text-compression algorithm introduced by @Ziv1977, which underpins a variety of common computer applications. Compression algorithms work by finding repeated sequences of text and using those patterns to reduce the overall size of the input document. The efficiency of the compression of a text is directly related to the number and length of the repeated sections in that text. We apply the compression algorithm to every document in our corpus, measure the degree of compression, and treat that quantity as the measure of repetition for each MP in each debate. Simply put, the more compression that a speech receives, the more repetitive we deem it to be.[^rep_alternative] 

[^rep_alternative]: An alternative measure of repetition for a given text, $j$, might be $\frac{\text{\# Words}_j}{\text{\# Unique Words}_j}$, which captures the intuition that texts with a smaller fraction of unique words are likely to be more repetitive. Although it correlates highly with our measure ($\rho = 0.71$), this metric is likely to underestimate the degree of repetitiveness in instances where long sequences of words are repeated, but where those sequences are themselves composed of many unique words.
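
Both repetition measures can be sketched in a few lines. As a stand-in for a pure LZ77 implementation, the sketch uses Python's standard `zlib` module, whose DEFLATE format combines LZ77 with Huffman coding; the exact compressor and the definition of "degree of compression" used in the paper may differ, so both are assumptions here.

```python
import zlib

def compression_repetition(text):
    """Share of bytes saved by compression: 1 - compressed size / raw size.

    Higher values mean more repeated material. Very short texts can score
    below zero because zlib adds a fixed header and checksum.
    """
    raw = text.encode("utf-8")
    return 1 - len(zlib.compress(raw, 9)) / len(raw)

def words_per_unique_word(text):
    """The footnoted alternative: # words / # unique words."""
    words = text.lower().split()
    return len(words) / len(set(words))


repetitive = "we must act now and we must act now " * 10
varied = ("the minister has yet to explain why funding for rural schools "
          "fell while administrative costs in the department rose sharply")
```

Both functions rank the repetitive text above the varied one, but only the compression measure is sensitive to *sequences* of repeated words rather than to individual word counts.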

Finally, to put our eight style measures on comparable scales, we normalise each measure across documents to have mean zero and standard deviation one. This means that average differences for each style between men and women can be interpreted in standard deviations of the outcome variable. 
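
The normalisation is a standard z-score. A minimal sketch follows; it assumes the sample (rather than population) standard deviation, a choice the text does not specify and which is immaterial with thousands of documents.

```python
import statistics

def standardise(scores):
    """Rescale scores to mean 0 and (sample) standard deviation 1."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(x - mean) / sd for x in scores]


z = standardise([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

On this scale, an estimated gender difference of 0.2 means that women's average score lies 0.2 standard deviations above men's.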

In the appendix, we provide extensive validation of our measures. In addition to a wide range of face validity checks, we provide results from a task which assesses whether our measures mirror human codings of the same concepts. The results are very encouraging: across all styles, the correlation between our text-based scores and human judgements is always strongly positive, indicating that we are able to reliably detect our styles of interest in parliamentary speech. In addition, our word-embedding measures clearly predict human codings more strongly than do measures based on standard dictionary approaches, which have been used in previous studies on gender and political style.[^justify_outperforms] 

[^justify_outperforms]: In particular, in appendix table S4 we show that the correlation between our measures and human codings is substantially higher than the correlation between human codings and a more standard dictionary measure which uses the proportion of words in each sentence that appear in each of our seed dictionaries.

# Modelling political style

Our goal is to assess the degree to which style use varies by MP gender, and whether such differences change over time. To investigate these patterns, we adopt a Bayesian dynamic hierarchical model that allows us to account for a wide variety of both individual- and topic-level confounders (described below), while also flexibly estimating changing gender dynamics in style over time. 

For each speech $i$, we have a continuous measurement of style $s$, which we denote as $y_i^s$. For speech $i$, by MP $j$, in debate $d$, and time period $t$, we model the data as a function of individual- and debate-level parameters:
\begin{eqnarray}\label{eq:model2:1st}
y_{i(jdt)}^s \sim N(\alpha_{j,t} + \delta_d, \sigma_y)
\end{eqnarray}

\noindent where $\alpha_{j,t}$ is an MP-specific random effect which captures average differences in MP style use. The $t$ subscript indicates that we fit one intercept for each MP in each time period that they appear in the data, thus allowing us to capture average style use at different points in time. We use parliamentary sessions as our unit of time, of which there were `r length(unique(speech_scores$session))` between 1997 and 2019. We observe speeches from `r round(mean(speech_scores[,list(n_mps = length(unique(person_id))),by = session]$n_mps))` MPs on average in each session, and each MP appears in `r round(mean(speech_scores[,list(n_mps = length(unique(session))),by = person_id]$n_mps))` sessions on average. The $\delta_d$ parameters are random effects which capture average differences in style use in different debates.  

Our primary interest is in describing variation in the $\alpha_{j,t}$ parameters. We model these random-effects at the second level of the model as a function of MP gender, while allowing the relationship between gender and style-use to vary over time:
\begin{eqnarray}\label{eq:model2:2nd}
\alpha_{j,t} \sim N(\mu_{0,t} + \mu_{1,t}Female_j, \sigma_{\alpha})
\end{eqnarray}

Here, $\mu_{0,t}$ represents the average use of a style among male MPs in time period $t$, and $\mu_{1,t}$ describes the average difference in style use for women relative to men, again in time period $t$. The standard deviation $\sigma_{\alpha}$ describes how much, on average, the MP-session intercepts vary around the mean style use for MPs of each gender.\footnote{We use a common variance parameter for all time periods.} Gender differences in one parliamentary session are not independent of those in previous sessions, so, to reflect a more realistic evolution of these differences, we model the $\mu_{0,t}$ and $\mu_{1,t}$ parameters as a first-order random-walk process:
\begin{eqnarray}
\mu_{0,t} \sim N(\mu_{0,t-1}, \sigma_{\mu_0}) \nonumber \\
\mu_{1,t} \sim N(\mu_{1,t-1}, \sigma_{\mu_1})
\end{eqnarray}

This specification assumes that the average use of a style by women and men will be similar in $t$ and $t+1$, and that changes over time will therefore occur gradually. This encourages smooth coefficient changes over time, but still allows for large deviations from one period to the next if the information from the data is sufficiently strong. 
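
To make the generative structure of equations \ref{eq:model2:1st} and \ref{eq:model2:2nd} and the random walk concrete, the following sketch simulates data forward from the model under made-up parameter values. It does not fit anything and is not the authors' Stan code; for simplicity it assumes that every MP appears in every session, that each session has its own set of debates, and that speeches are assigned to debates at random.

```python
import random

def simulate_styles(n_sessions=20, n_mps=100, debates_per_session=50,
                    speeches_per_mp=10, mu1_start=0.4, sigma_mu0=0.05,
                    sigma_mu1=0.05, sigma_alpha=0.5, sigma_delta=0.3,
                    sigma_y=1.0, share_female=0.3, seed=1):
    """Forward-simulate style scores from the dynamic hierarchical model.

    All parameter values are hypothetical. rng.gauss() takes a standard
    deviation, matching the N(mean, sd) notation used in the text.
    """
    rng = random.Random(seed)

    # First-order random walks for the male mean (mu0) and the gender gap (mu1).
    mu0, mu1 = [0.0], [mu1_start]
    for _ in range(1, n_sessions):
        mu0.append(rng.gauss(mu0[-1], sigma_mu0))
        mu1.append(rng.gauss(mu1[-1], sigma_mu1))

    female = [1 if rng.random() < share_female else 0 for _ in range(n_mps)]
    # Mean-zero debate intercepts; debates are nested within sessions here.
    delta = [rng.gauss(0.0, sigma_delta)
             for _ in range(n_sessions * debates_per_session)]

    rows = []  # (session, mp, debate, female, y)
    for t in range(n_sessions):
        for j in range(n_mps):
            # MP-session intercept drawn around the gender-specific mean.
            alpha_jt = rng.gauss(mu0[t] + mu1[t] * female[j], sigma_alpha)
            for _ in range(speeches_per_mp):
                # Pick a debate from session t at random (with replacement).
                d = t * debates_per_session + rng.randrange(debates_per_session)
                y = rng.gauss(alpha_jt + delta[d], sigma_y)
                rows.append((t, j, d, female[j], y))
    return mu0, mu1, rows


mu0, mu1, rows = simulate_styles()
# Average style use among women in session t is mu0[t] + mu1[t].
female_means = [m0 + m1 for m0, m1 in zip(mu0, mu1)]
```

Small values of `sigma_mu1` produce a gender gap that drifts slowly between sessions, which is the smoothing behaviour the random-walk prior is meant to encode; larger values allow sharper session-to-session jumps.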

The $\mu_{0,t}$ and $\mu_{1,t}$ parameters are our main quantities of interest. $\mu_{1,t}$ captures the difference in average style use between genders in each time period, and our review of the theoretical literature implies general expectations for the *sign* of $\mu_{1,t}$ for each style (see table \ref{tab:definitions_table}). Consistent with our theoretical discussion of how the incentives for conforming with gender stereotypes have changed in recent years, we also expect the *magnitude* of $\mu_{1,t}$ for each style to decrease over time, and for those changes to be driven mostly by changes to the average behaviour of female MPs (which, for each session $t$, is captured by $\mu_{0,t} + \mu_{1,t}$). We report both quantities below.

We account for individual-level confounders by including a set of MP-specific covariates in the model. To do so, in some specifications we replace equation \ref{eq:model2:2nd} with:
\begin{eqnarray}\label{eq:model2:2nd:control}
\alpha_{j,t} \sim N(\mu_{0,t} + \mu_{1,t} Female_j + \sum_{k = 1}^{K} \lambda_k X_{j,t}^k, \sigma_{\alpha})
\end{eqnarray}

\noindent where $X_{j,t}$ is a vector of individual-level covariates which can vary by session.[^model_for_delta]

We include several such controls. First, MPs in leadership positions may use systematically different styles than backbench MPs, and women have come to occupy a greater share of legislative leadership roles over our study period [@Blumenau2019b]. We therefore control for whether the MP held a frontbench position for either the government or opposition in each session, and whether they were a committee chair.

Second, if MPs from different parties use styles at different rates, then any change we observe in gendered use of styles might be confounded by the fact that Conservatives have made up a growing share of the female MPs elected to parliament in recent years. We therefore also include a set of party dummies.

Third, opposition MPs use significantly more negative language than government MPs [@Proksch2019] and, because the Labour Party has proportionally more women than other parties, any increase in women's use of more agentic styles might be attributable to Labour's move into opposition in 2010. To address this possibility, we control for whether an MP is a member of a governing or an opposition party in each time period. 

Fourth, we add controls for MPs' occupational background and educational attainment. The professional and educational backgrounds of MPs have changed over time [@lamprinakou2017all], and it is plausible that these characteristics will be associated with both speechmaking styles and gender. 

Finally, MPs' local electoral environment might affect language use. For instance, MPs in more competitive seats might be more likely to use human narrative to emphasise constituents' concerns. If there have been changes in the relative competitiveness of seats won by men and women over time, this could confound the differences we observe in gendered-language use. We therefore control for the percentage point margin of victory of the MP in the previous election. 

We are also able to use our model to account for confounding that relates to differential usage of styles across topics. Men and women systematically participate in debates devoted to different topics [@Back2019a; @Catalano2009], and debate topic may correlate with style in ways which work to confound our inferences. For instance, if women participate more in debates on education which contain language related to human narrative, while men participate more in debates on the economy which include more factual language, then gender differences in topic usage will confound gender differences in style. However, the debate-level intercepts, $\delta_d$, mean that it is only *within-debate* variation in style use that informs the estimates of our central quantities of interest. In other words, $\mu_{1,t}$ will capture only the degree to which men and women use different styles when speaking about the same substantive topic. 
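
The logic of identifying gender differences from within-debate variation can be illustrated with a small, noise-free example. The sketch uses a simple fixed-effects (within-debate demeaning) estimator as a stand-in for the model's debate random effects; the data are hypothetical. Women mostly speak in a high-narrative "education" debate and men mostly in a low-narrative "economy" debate, while the true within-debate gender gap is 0.2.

```python
from collections import defaultdict

def mean(xs):
    return sum(xs) / len(xs)

def raw_gap(rows):
    """Naive difference in mean style between women and men."""
    women = [y for _, f, y in rows if f == 1]
    men = [y for _, f, y in rows if f == 0]
    return mean(women) - mean(men)

def within_debate_gap(rows):
    """Slope of style on gender after demeaning both within each debate."""
    by_debate = defaultdict(list)
    for d, f, y in rows:
        by_debate[d].append((f, y))
    num = den = 0.0
    for obs in by_debate.values():
        f_bar = mean([f for f, _ in obs])
        y_bar = mean([y for _, y in obs])
        for f, y in obs:
            num += (f - f_bar) * (y - y_bar)
            den += (f - f_bar) ** 2
    return num / den


# (debate, female, style score): debate means of +1 and -1, gender gap of 0.2.
rows = ([("education", 1, 1.1)] * 8 + [("education", 0, 0.9)] * 2
        + [("economy", 1, -0.9)] * 2 + [("economy", 0, -1.1)] * 8)
```

Here the naive comparison suggests a gap of 1.4 standard deviations, almost all of which reflects *where* women speak rather than *how*; comparing men and women within the same debate recovers the true gap of 0.2.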

We estimate our model separately for each style in Stan [@carpenter2017stan], where we use three chains of 500 iterations, after 250 iterations of burn-in.

[^model_for_delta]: Debate intercepts are drawn from a mean-zero normal distribution, with estimated variance:
\begin{eqnarray}
\delta_d \sim N(0, \sigma_\delta)
\end{eqnarray}
Our model is completed by normal prior distributions over the $\lambda$ parameters:
\begin{eqnarray}
\lambda_k \sim N(0, 2)
\end{eqnarray}
\noindent and half-normal prior distributions over the variance-parameters:
 \begin{eqnarray}
\sigma_{\alpha}, \sigma_{\mu_0}, \sigma_{\mu_1}, \sigma_{\delta}, \sigma_y \sim N(0, 2)  
\end{eqnarray}

# Results

Figure \ref{fig:model2_time} shows the values of $\mu_{1,t}$ -- the average difference between men and women for each style type, in each parliamentary session. Positive values indicate that a style is used more by women, and negative values indicate higher use of the style by men. The grey shading indicates the expected direction of the gender effects based on previous literature (see table \ref{tab:definitions_table}). 

For five of the debating styles we study, we find that -- in the early years of our sample -- male and female speechmaking behaviour broadly conforms to stereotypes. Female MPs are more likely than male MPs to draw on examples that emphasise human narrative, and to use positive and emotional language. Similarly, men use more aggressive and complex language, at least before 2010. Interestingly, for three of our styles -- fact, repetition, and negative emotion -- we find that debating style in the Commons does not clearly conform to the expectations of the existing literature. For all three of these agentic styles, for much of the period we study, female MPs are *more* likely than male MPs to express these styles. 


\afterpage{
\blandscape

\vspace*{\fill}
\begin{figure}
\includegraphics[width = 1.5\textwidth]{analysis/plots/model2_gender_effect_time.pdf}
\caption{Gender differences in style over time}
\label{fig:model2_time}
\end{figure}
\vfill
\elandscape
}


However, and in some sense more importantly, figure \ref{fig:model2_time} also reveals that there is significant variation in the size of these gender differences over time. Women are more likely to use "communal" style types -- affect, positive emotion, and human narrative -- in the early period in our data, but gender differences become smaller over time. For positive emotion and affect, there is no consistent significant difference between men and women by the latest years in our data. Similarly, while men use significantly more "agentic" styles -- particularly aggressive and complex language -- than women before 2010, this difference has also disappeared in recent years. These changes are non-trivial: for those styles where we see a convergence between men and women, the proportion of MP-level variation in style use explained by gender after 2007 is between `r round(min(c(late_r2/early_r2 * 100)[names(late_r2) %in% c("anecdote_std", "affect_std", "posemo_std", "complexity_std", "aggression_std")]))`% and `r round(max(c(late_r2/early_r2 * 100)[names(late_r2) %in% c("anecdote_std", "affect_std", "posemo_std", "complexity_std", "aggression_std")]))`% of its pre-2007 level, depending on style.[^gelman_r_square] 

[^gelman_r_square]: To calculate these quantities we follow @Gelman2006 and describe the proportion of individual variation in style use explained by MP gender in each parliamentary session using an $R^2$-style metric:
$$R^2_{\alpha,t} = 1 - \frac{E(V_{j=1}^J \hat{\epsilon}_{j,t})}{E(V_{j=1}^J \hat{\alpha}_{j,t})}$$ 
where 
$$\hat{\epsilon}_{j,t} = \hat{\alpha}_{j,t} - (\hat{\mu}_{0,t} + \hat{\mu}_{1,t} Female_j)$$
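
The $R^2_{\alpha,t}$ metric above can be computed from posterior draws for a single session as sketched below. The sketch assumes that $V$ is the sample variance across MPs and that the expectations are averages over posterior draws; the input layout (lists of draws) is a hypothetical convenience, not the format of the authors' Stan output.

```python
import statistics

def r2_alpha(alpha_draws, mu0_draws, mu1_draws, female):
    """R^2 for MP-session intercepts in one session t.

    alpha_draws: one list of MP intercepts per posterior draw.
    mu0_draws, mu1_draws: one scalar per posterior draw.
    female: 0/1 indicator for each MP (same order as the intercepts).
    """
    v_eps, v_alpha = [], []
    for alpha, mu0, mu1 in zip(alpha_draws, mu0_draws, mu1_draws):
        # Residual from the gender-specific mean in this draw.
        eps = [a - (mu0 + mu1 * f) for a, f in zip(alpha, female)]
        v_eps.append(statistics.variance(eps))
        v_alpha.append(statistics.variance(alpha))
    return 1 - statistics.fmean(v_eps) / statistics.fmean(v_alpha)


female = [0, 0, 1, 1]
# One hypothetical draw in which gender explains most intercept variation.
r2 = r2_alpha([[0.1, -0.1, 1.1, 0.9]], [0.0], [1.0], female)
```

If every intercept sat exactly at its gender-specific mean, the residual variance would be zero and $R^2_{\alpha,t}$ would equal one; a value near zero would indicate that gender explains little of the variation across MPs.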

Further, for other styles we observe increasing gender differences over time, but the direction of these shifts also suggests that women are becoming increasingly agentic relative to men. For instance, though there are negligible gender differences in the earlier period, from 2007 onward women use significantly more factual language. Similarly, although women use negative emotion in their speeches at higher rates throughout the time period, this gender difference has grown substantially larger over time. Between 1997 and 2007, gender explained just `r round(early_r2[names(early_r2) == "fact_std"]*100,1)`% and `r round(early_r2[names(early_r2) == "negemo_std"]*100,0)`% of MP-level variation in factual language and negative emotion, respectively, but this increased to approximately `r round(late_r2[names(late_r2) == "fact_std"]*100,0)`% and `r round(late_r2[names(late_r2) == "negemo_std"]*100,0)`% after 2007. Accordingly, even for these agentic styles, which women adopt more than men throughout the study period, women have become *more* likely to deploy this type of language in recent years than they were in the past. The only style for which we document relatively stable gender differences is repetition. While women appear to become somewhat less repetitive relative to men over time, the trend for this style is less pronounced.
 
Taken together, these findings are consistent with our argument that the pressures for women to conform to stereotypes have declined over time. In general, relative to men, women demonstrate less communal (human narrative, affect, and positive emotion) and more agentic (negative emotion, aggression, fact, and complexity) styles in recent years than they did in the past. 

Are these patterns the result of changes in the behaviour of male or female MPs? Our argument implies that these changes are likely to be rooted in the behaviour of women as they respond to the changing content and power of gender stereotypes. Figure \ref{fig:model2_men_women_time}, which depicts changes in style use separately for women and men, shows that across almost all the styles we study, the largest shifts in speechmaking behaviour over time do indeed occur among women. The figure shows that women have used each of our communal style types -- human narrative, affect, and positive emotion -- to a decreasing extent over time. Similarly, for the "agentic" styles of negative emotion, fact, and aggression, while men's behaviour has remained relatively stable, women's use of these styles has increased over time. While both men and women have adopted more complex language over time, the increase has been somewhat larger for women than for men. 

\afterpage{
\blandscape

\vspace*{\fill}
\begin{figure}
\includegraphics[width = 1.5\textwidth]{analysis/plots/model2_average_style_use.pdf}
\caption{Average style use for men and women over time}
\label{fig:model2_men_women_time}
\end{figure}
\vfill
\elandscape
}


## Threats to inference

We have argued that politicians' conformity to traditional stereotypes has diminished over time, but there are alternative explanations that could account for the behavioural patterns we document.

First, as we outlined above, the literature on feminist institutionalism suggests that women will face pressures to conform to institutionally-dominant, masculine behaviours that are favoured and recreated by the culture of the House. While institutional pressures of this sort are surely a feature of life in the contemporary House of Commons, for this perspective to explain our results, these pressures would need to have *strengthened* in recent years. However, during this period, women's presence increased in the Commons, female MPs came to hold more senior positions, and reforms designed to strengthen women's position within the institution were introduced. Consequently, if women face stronger incentives to conform to masculine styles when an institution is more male dominated, then we should observe women becoming *less* agentic over time. Our results show the opposite pattern, suggesting that institutionalist accounts are unlikely to explain the over-time dynamics in speechmaking that we document. 

Second, figures S4 and S5 in the appendix report results from the model described in equation \ref{eq:model2:2nd:control} in which we control for a host of MP-level covariates. If the changes we observe over time are driven by factors such as party or opposition status, we would expect to see large differences between the results with and without these controls. Although there is some attenuation of the over-time changes for aggression and complexity when controlling for covariates, we nonetheless still observe stereotype-consistent differences at the beginning of the sample period and clear evidence of women using more agentic styles over time. 

Third, the aggregate patterns we document could reflect changes to the parliamentary agenda, rather than changes in gendered behaviour. If there are certain topics on which women are more likely to demonstrate agentic styles, and these topics feature more prominently on the parliamentary agenda in the later period, then our results might be explained by changing topical prevalence over time. In the appendix, we use statistical topic-models to measure the differences between male and female style use across a wide variety of topics, and then evaluate whether topics that are marked by large stylistic differences become more or less prevalent over time relative to topics marked by smaller differences. We find scant evidence of such topical confounding.

Finally, in the appendix, we also assess whether women with more agentic styles participate more, and women with more communal styles participate less, in parliamentary debate over time. We show that differential participation does not explain our results: MPs' styles largely fail to predict debate participation throughout our study period. In addition, we also investigate whether the changes we document are due to changes in the styles of men and women throughout their careers ("within-MP" effects), or because the men and women entering parliament over time are systematically different from those leaving ("replacement" effects). While there is some evidence that replacement is more important for explaining the changes in agentic styles and within-MP change is somewhat more important for explaining change in communal styles, overall we find that neither replacement nor within-MP change can alone explain the patterns that we document above. 


# Conclusion

Our central substantive contribution is to document that gender stereotypes are worse descriptors of actual political behaviour in the UK now than they were in the past. In particular, in recent years, women in the House of Commons demonstrate less communal and more agentic styles, and the gender gap on most dimensions of style that we examine has decreased. We see these results as an important corrective to the scholarly literature on gender differences in legislative behaviour, which typically emphasises that male and female politicians argue in ways that are broadly consistent with stereotypes. Though this may be true in some settings, gender stereotypes of communication styles have become significantly less predictive of the reality of contemporary British political debate.

These findings do not, however, imply that gender stereotypes play no role in UK politics. We show that recent parliamentary behaviour is poorly described by traditional stereotypes, but we do not provide empirical evidence regarding the mechanisms that led to these changes. For instance, previous work shows that the public are less likely to endorse traditional stereotypes now than in the past, but we lack data to assess whether there has been a concomitant decline in the sanctions that voters apply for gender-role-inconsistent behaviour. Anecdotally, there continue to be examples of British female politicians being criticised for stereotype-incongruent behaviour[^maybot] and stereotypes may well continue to condition voter responses. Though we think this type of sanctioning is likely to have declined because of general changes in voter attitudes regarding stereotypes, future work should focus on collecting over-time survey data on voters' attitudes towards non-stereotypical behaviour by politicians.

[^maybot]: See ["The Making of the Maybot"](https://www.spectator.co.uk/article/the-making-of-the-maybot), *Spectator*, 2nd November 2017. 


One optimistic view of our findings, however, is that there may be a virtuous circle: as female politicians diverge from stereotypical behaviours, perceptions of appropriate feminine behaviour change, which in turn reduces the pressures on women to conform to such stereotypes. Changes in the typical behaviours of male and female politicians are likely to translate only slowly into revised public expectations of the standards against which men and women MPs are judged. However, to the degree that the behavioural shifts that we document are noticed and internalised by the public, they might also help to reduce the social penalties applied to female politicians who display more agentic styles.

Our findings also have implications for wider debates about political culture in the UK. There is a strand of popular commentary that implies some of the more unattractive features of Westminster's adversarial culture would be ameliorated if only more women were to be elected to public office. Our results suggest, however, that simply increasing women's numbers in parliament is unlikely to make UK politics gentler or more deliberative. The pursuit of a "better" politics requires more than vaguely hoping, on the basis of a dogged adherence to outdated gender stereotypes, that the election of women will fundamentally change the ways that our representatives communicate.

Methodologically, our paper addresses a well-known problem for quantitative text analysis based on dictionaries: the words that demonstrate a given concept in one context may be poorly suited to detecting the use of that concept in another context. We used a word-embedding model to capture how different political styles manifest in the specific setting of parliamentary debate. Results from our validation (in the appendix) show that this approach significantly outperforms existing methods, a finding we believe justifies adoption elsewhere. Our strategy is likely to be useful whenever researchers are interested in measuring a latent concept from a large corpus of texts, but where the domain of interest differs from the domain in which existing dictionaries were developed. This describes a large fraction of applications of dictionary methods, and so our approach has the potential to be applied widely elsewhere.

Finally, we focus only on style as expressed in legislative debates. Style may, of course, manifest in other forms of legislative behaviour, or, indeed, other forms of political speech. While we show that gender gaps in legislative speech have declined, it remains possible that gender stereotypes may still be powerful in other arenas, such as in campaign communication where politicians may be particularly sensitive to voter penalties. We hope that our findings motivate other scholars to explore how gender-based differences in political communication have evolved over time in other contexts. 


\newpage

# References
